##Import the EITC Data Using an API
These data were provided by New York State through Data.NY.gov. The original data set includes information for all counties in New York State. For purposes of this project, we’re only interested in New York City, that is, the five counties that correspond to its five boroughs.
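A minimal sketch of the import step in Python, assuming the portal exposes the data set through a Socrata-style JSON endpoint. The identifier `xxxx-xxxx` is a placeholder rather than the real data set ID, and the `$limit` value is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: replace with the real data set identifier from the portal.
DATASET_ID = "xxxx-xxxx"
BASE_URL = "https://data.ny.gov/resource"


def build_url(dataset_id, limit=50000):
    """Build the JSON endpoint URL, asking for up to `limit` rows."""
    query = urlencode({"$limit": limit})
    return f"{BASE_URL}/{dataset_id}.json?{query}"


def fetch_rows(url):
    """Download the records as a list of dicts (one dict per row)."""
    with urlopen(url) as response:
        return json.load(response)


url = build_url(DATASET_ID)
# rows = fetch_rows(url)  # uncomment once DATASET_ID is filled in
```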
##Tidy the data
There are several columns that we won’t need for this analysis. The
notes, place_of_residence, and
place_of_residence_sort_order columns will be dropped because they
either contain no data (notes) or duplicate information held in other
columns.
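As a rough illustration, dropping these columns from records held as a list of dicts might look like the sketch below. The sample row is made up; only the three dropped column names and county come from the text, and tax_year is an assumed column name.

```python
# Columns named in the text as empty or redundant.
DROP = {"notes", "place_of_residence", "place_of_residence_sort_order"}

# Made-up row for illustration only.
rows = [
    {
        "tax_year": "2016",  # assumed column name
        "county": "Kings",
        "notes": "",
        "place_of_residence": "Kings",
        "place_of_residence_sort_order": "3",
    },
]


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted keys."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


tidy = drop_columns(rows, DROP)
```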
Next, we’ll filter this data to only show the counties within New York City: Bronx, Kings, Manhattan, Queens and Richmond. For consistency across the project, we’ll then convert the county names to their corresponding borough names. There are two that need to change: Kings County corresponds to Brooklyn and Richmond County corresponds to Staten Island.
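The filter-and-rename step can be sketched as follows. The county names and the two renames come from the paragraph above; the sample rows are invented.

```python
NYC_COUNTIES = {"Bronx", "Kings", "Manhattan", "Queens", "Richmond"}

# Only these two county names differ from their borough names.
COUNTY_TO_BOROUGH = {"Kings": "Brooklyn", "Richmond": "Staten Island"}


def keep_nyc(rows):
    """Keep NYC rows only and replace county names with borough names."""
    result = []
    for row in rows:
        if row["county"] in NYC_COUNTIES:
            renamed = dict(row)  # copy so the input rows are not modified
            renamed["county"] = COUNTY_TO_BOROUGH.get(row["county"], row["county"])
            result.append(renamed)
    return result


# Made-up rows for illustration only.
rows = [{"county": "Albany"}, {"county": "Kings"}, {"county": "Queens"}, {"county": "Richmond"}]
nyc = keep_nyc(rows)
```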
Next, we’ll clean up the credit_type variable names for
ease of use, then we’ll coerce character columns to numeric as
needed.
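One possible way to do both steps in plain Python is sketched here. The raw labels passed to `clean_label` are hypothetical, since the exact credit_type values are not shown, and treating non-numeric strings as missing (`None`) is an assumption.

```python
import re


def clean_label(label):
    """Lowercase a label and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")


def to_number(text):
    """Convert a string such as '1,234' to a float; return None if not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


# Hypothetical raw values for illustration only.
label = clean_label("Noncustodial Parent EITC ")
amount = to_number("1,234")
missing = to_number("n/a")
```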
Finally, the credit_amount_claimed_in_thousands column needs to
be multiplied by 1000 to get the actual dollar amount claimed in each
borough. I’m also going to rename county to
borough for consistency across the project.
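A small sketch of the unit conversion and the rename, applied to a single record. The output column name credit_amount_claimed is an assumption, and the sample value is made up.

```python
def to_dollars(row):
    """Rename county to borough and convert thousands of dollars to dollars."""
    replaced = ("county", "credit_amount_claimed_in_thousands")
    out = {k: v for k, v in row.items() if k not in replaced}
    out["borough"] = row["county"]
    # Assumed name for the converted column.
    out["credit_amount_claimed"] = row["credit_amount_claimed_in_thousands"] * 1000
    return out


# Made-up record for illustration only.
row = {"county": "Brooklyn", "credit_amount_claimed_in_thousands": 12.5}
converted = to_dollars(row)
```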
##Types of Tax Credits
There are two types of EITC in this data set, and one of them comes in two forms. The regular EITC is offered by both the state and the city, and claimants can file for both; the state credit tends to be the larger of the two. The Noncustodial Parent EITC is offered exclusively by the state. Claiming it disqualifies the filer from the regular EITC, so it can only be claimed on its own. The qualification criteria for the two types do not overlap, but the underlying economic considerations are the same.
Earned Income Tax Credit (EITC): For those who have worked and earned income under $57,414. It’s designed primarily to give low-to-moderate income individuals and families a tax break. The amount of the credit depends on income. There are two forms of this credit: one from the state and one from the city. The eligibility criteria are the same for both.
Noncustodial Parent EITC: For those who meet the income threshold and have a child, but do not have custody of the child. This is offered exclusively by the state.
##Overall Trends
First, we’ll create a graph to show the number of claims in each borough from 2006 onward. Although the data goes back as far as 1994, the City EITC wasn’t introduced until 2004 and the Noncustodial Parent EITC wasn’t introduced until 2006. For consistency, we’ll look at 2006 onward for an idea of the overall number of claims made each year in each borough.
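The totals behind such a graph could be computed along these lines. The column names tax_year and number_of_claims are assumptions, and the rows are invented.

```python
from collections import defaultdict


def claims_by_borough_year(rows, first_year=2006):
    """Sum claims per (borough, year), ignoring years before first_year."""
    totals = defaultdict(int)
    for row in rows:
        if row["tax_year"] >= first_year:
            totals[(row["borough"], row["tax_year"])] += row["number_of_claims"]
    return dict(totals)


# Made-up rows for illustration only.
rows = [
    {"borough": "Bronx", "tax_year": 2005, "number_of_claims": 90},
    {"borough": "Bronx", "tax_year": 2006, "number_of_claims": 100},
    {"borough": "Bronx", "tax_year": 2006, "number_of_claims": 20},
    {"borough": "Queens", "tax_year": 2007, "number_of_claims": 50},
]
totals = claims_by_borough_year(rows)
```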
##Proportion of Each Borough Filing a Claim
The graph above shows the overall trend in the number of claims filed per borough, but the number of claims scales with each borough’s population. For instance, Brooklyn is the most populous borough, so it is unsurprising that the most claims were filed there.
Instead of looking at the raw trend, it would be helpful to understand what percentage of each borough’s population files a claim. Each year, the US Census Bureau estimates county populations, basing its projections on the most recent Census data and vital statistics. We’re going to pull these estimates into a dataframe, merge them with our tax data, then find the proportion of claimants in each borough.
We’ll limit the time frame to 2016-2018, the same years that the
Greenspace data were collected. The code book for this data
can be accessed here.
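A minimal sketch of the merge and the proportion, assuming both tables have already been reduced to dicts keyed by (borough, year). All figures are invented.

```python
def claim_share(claims, population, first_year=2016, last_year=2018):
    """Divide claims by population for keys present in both tables."""
    return {
        key: claims[key] / population[key]
        for key in claims
        if key in population and first_year <= key[1] <= last_year
    }


# Made-up numbers for illustration only.
claims = {("Bronx", 2015): 240, ("Bronx", 2016): 250, ("Queens", 2017): 100}
population = {("Bronx", 2015): 1000, ("Bronx", 2016): 1000, ("Queens", 2017): 2000}
share = claim_share(claims, population)
```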
##Average Claim Amount
Next, we’ll look at the average claim amount for 2016-2018 in each
borough, stratifying by credit type. Since our primary
Greenspace data set only includes data from 2016-2018,
we’re going to limit our analysis to these years. Since the
qualification criteria are the same for City and State EITC, we’re going
to add the amounts to create a single category called
EITC.
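The combining step might be sketched as below. The credit_type labels and the tax_year and credit_amount_claimed column names are assumptions, and the amounts are invented. The sketch covers only the pooling of city and state amounts, not the averaging itself.

```python
from collections import defaultdict

# Hypothetical labels: both regular credits map to one category.
COMBINED = {"city_eitc": "EITC", "state_eitc": "EITC"}


def combine_eitc(rows, first_year=2016, last_year=2018):
    """Sum amounts per (borough, year, category), pooling city and state EITC."""
    totals = defaultdict(float)
    for row in rows:
        if first_year <= row["tax_year"] <= last_year:
            category = COMBINED.get(row["credit_type"], row["credit_type"])
            key = (row["borough"], row["tax_year"], category)
            totals[key] += row["credit_amount_claimed"]
    return dict(totals)


# Made-up rows for illustration only.
rows = [
    {"borough": "Bronx", "tax_year": 2016, "credit_type": "city_eitc", "credit_amount_claimed": 1000.0},
    {"borough": "Bronx", "tax_year": 2016, "credit_type": "state_eitc", "credit_amount_claimed": 3000.0},
    {"borough": "Bronx", "tax_year": 2016, "credit_type": "noncustodial_parent_eitc", "credit_amount_claimed": 500.0},
    {"borough": "Bronx", "tax_year": 2015, "credit_type": "state_eitc", "credit_amount_claimed": 2000.0},
]
combined = combine_eitc(rows)
```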
##Use ANOVA for these